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The distribution of mutational fitness effects (DMFE) is crucial to the evolutionary fate of quasispecies. In 
this article we analyze the effect of the DMFE on the dynamics of a large quasispecies by means of a 
phenotypic version of the classic Eigen's model that incorporates beneficial, neutral, deleterious, and lethal 
mutations. By parameterizing the model with available experimental data on the DMFE of Vesicular 
stomatitis virus (VSV) and Tobacco etch virus (TEV), we found that increasing mutation does not totally 
push the entire viral quasispecies towards deleterious or lethal regions of the phenotypic sequence space. The 
probability of finding regions in the parameter space of the general model that results in a quasispecies only 
composed by lethal phenotypes is extremely small at equilibrium and in transient times. The implications of 
our findings can be extended to other scenarios, such as lethal mutagenesis or genomically unstable cancer, 
where increased mutagenesis has been suggested as a potential therapy. 



Eigen's model of molecular quasispecies 13 , initially conceived to explore prebiotic evolution, has played a key 
role in understanding the population dynamics and evolution of RNA viruses 4 7 . Due to the complexity of 
these pathogens, some theoretical assumptions and predictions fail to catch important properties of viral 
dynamics and adaptation 8 . For instance, assumptions made considering simple fitness landscapes generally 
predict the presence of transitions towards error threshold or extinction regimes as mutation rates increase 511 . 
However, biological evolution proceeds on more complex, rugged fitness landscapes 3,12,13 and, outstandingly, not 
all mutations exert the same effect on viral fitness 14,15 . 

Efforts have been made to expand the basic quasispecies theory by relaxing its original assumptions and 
incorporating more virus-realistic features: fitness landscapes with multiple peaks 16 , neutrality and robust- 
ness 17,18 , spatial effects 10,19-21 , complementation during coinfection 22,23 , and different modes of replication and 
epistasis 21,24 , among others. Despite all these modeling approaches, there is still a major factor that remains poorly 
explored by the standard quasispecies model: realistic distributions of mutational fitness effects (DMFE). DMFE 
may exert a strong impact in the evolutionary dynamics of a large quasispecies population. Therefore, our aim in 
this study is to provide a broader framework by incorporating experimentally available DMFE within quasis- 
pecies evolutionary dynamics. 

The fitness effects of mutations are central to evolution 25 27 . Several experiments quantifying the fraction of 
spontaneous mutations as either selectively beneficial, neutral, deleterious or lethal have been performed in, e.g., 
Arabidopsis thaliana 2 ", Drosophila me\anogaster 2SX ', bacteria 31 , and viruses 14,15,32,33 . The efforts made by virolo- 
gists in describing DMFE have not been mirrored by theoretical research in quasispecies populations. As yet, few 
models have explicitly considered the DMFE in asexual replicators 34,35 , and none of them have parameterized the 
standard quasispecies model with experimentally available data on DMFE. Furthermore, the impact of the DMFE 
on the error threshold and how they may promote or constrain the generation of deleterious or lethal phenotypes 
in a large quasispecies population (e.g., in persistently infected hosts) remains unknown. 

The error threshold is a theoretical average mutation rate that sets a maximum limit for maintenance of genetic 
information encoded by a replicating system 36 . Usually, the error threshold takes place when the pool of mutants 
displaces the wild-type sequence due to increased mutation. It is important to mention that the error threshold 
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implicitly depends on the fitness landscape 37 ; e.g., for the single-peak 
fitness landscape, the error threshold occurs when the homogeneous 
cloud of mutants with lower fitness outcompetes the wild-type 
sequence 1-2,1011 . In order to introduce a more realistic fitness land- 
scape into the quasispecies model, as well as to consider realistic 
DMFE that determine how mutation moves the quasispecies within 
this landscape, we analyze a phenotypic quasispecies mathematical 
model incorporating variable mutational fitness effects (see 
Figure 1). 

Here, we will first investigate the impact of increasing mutation 
rate on the phenotypic distributions and on the viability of the qua- 
sispecies integrating available experimental data on the DMFE for 
two RNA viruses: the animal pathogen Vesicular stomatitis virus 14 
and the plant pathogen Tobacco etch virus 15 . Second, we will invest- 
igate the dynamics for the general model with and without pheno- 
typic reversion, since we are especially interested on a detailed 
analysis of the dependence of the results with respect to the model 
parameters. Finally, we perform an extensive search in the parameter 
space to evaluate the likeliness of finding parameter combinations 
that impair the success of viral phenotypes, driving the quasispecies 
towards lethal and deleterious vertices of the phenotypic sequence 
space. 

Results 

We refer the reader to the Model and Methods section and to Section 
1 in the Supplementary Information for a detailed description of the 
mathematical model and, in particular, for the notation and meaning 
of variables and parameters that we use in what follows. 

Dynamics with realistic DMFE. First, we present results on the 
model Eqs. (1) incorporating DMFE data for VSV and TEV and 
considering that phenotypic reversions (via backward or 
compensatory mutations) are not possible. Previous theoretical 
work has assumed that the likelihood of backward mutations was 
extremely small due to the length of RNA viral genomes 101 "' 22 , while 
other models have incorporated backward mutations in the 
dynamics of quasispecies 16,38 . The case with no backward 



mutations (no phenotypic reversion) can be studied in our model 
by setting 8 = 0 (being 5 the probability of phenotypic reversion in 
the quasispecies, see Model and Methods section). Under our model 
assumptions, together with 3 = 0 and considering no production of 
beneficial mutations i.e., ?. B 0 (see Model section), Eqs. (1) have 
two equilibrium points (given as Q 3 and Q 4 in Section S4 of 
Supplementary Information). When no phenotypic reversion is 
allowed (i.e., 8 = 0), the relevant fixed point for VSV data is given 
by Q 3 (see SI Section S4). This fixed point, which is an attractor under 
the parameter values for VSV, can be represented for the sake of 
clarity as: (x£ = 0, x 1100 , X* m , %* 101 , %J ln ), where keH r \{1100, 
1110, 1101, 1111\}, and 

x iioo = 5 b(1-A')7 IF ' 

*mo = i 1 +Sb)A d [i[1 -/i)/*P, 

*noi=^./' s D(l-A0/ l r\ 

*mi =^U^-s D + s B )k D j.i/ x ¥, 

with ¥ = s D (l - n) 2 + [(1 + s B )X D n + SjA,"] (1 - H) + (1 - s D + 
s B )?. D A L [i 2 . Here s D and s B are the selection coefficients tied to 
deleterious and beneficial mutations, fi is mutation rate, and a d 
and X L are the frequencies of production of deleterious and lethal 
phenotypes during the process of replication and mutation (see 
Model and Methods section for further details on the parameters 
of the model). The previous equilibrium point involves the 
persistence of four different sequences: x* 100 , x* no , and the lethals 
x 1101 and Xj ln . That is, when <5 = 0 and ?. B > 0 (i.e., no phenotypic 
reversion for VSV), 12 of the 16 different sequences of the 
quasispecies will asymptotically achieve extinction, and four types 
of sequences will only compose the quasispecies: a neutral mutant 
with the beneficial phenotype (xj 100 ) and the sequence with neutral, 
beneficial, and deleterious mutations (xj 110 ). The other two 
sequences have the lethal phenotype (i.e., jc* 101 and %* m ). These 
results perfectly match with the time series shown in Figure SI. 
Moreover, projections of the dynamics in the simplex suggest that 




Figure 1 | Population structure of the phenotypic quasispecies. Sequences can carry beneficial (B), neutral (N), deleterious (D), and lethal (I) mutations 
(bit 0 means non-mutated, and bit 1 means mutated), (a) Phenotypic sequence space where ball size indicates strings' replicative fitness. The inner green 
cube contains the lethal mutants. The fitness of some strings is shown on a fitness landscape with dashed arrows. The landscape has three different peaks 
available through mutation: wild-type ( W), beneficial (B), and deleterious (D) peaks, while the ground (light blue) indicates the lethal (I) phenotype with 
zero replication. Strings x 1010 and x nl0 will occupy peaks W, B, or D, depending on the values of the selection coefficients s B and s D . (b) Example of 
replication with mutation for string i 1I10 , with replication rate A n io = 1 + % — s D . Gray arrows denote mutation transitions, (c) Sequences of the 
quasispecies (1st column), their phenotypes (2nd column), and their fitnesses determined by the replication rates, A,- (3rd column). 
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such a fixed point is globally stable (Figure S2), which has been 
confirmed analytically in SI Section S4. 

The effect of increasing mutation involves the extinction of the 
non-mutated genomes at equilibrium (Figure 2). Such extinction is 
mainly due to outcompetition by the beneficial phenotype x 1100 , 
which is present in the quasispecies in a wide range of mutation rates, 
independently of the values of the selection coefficients s B and s D 
[Figure 2(a) and (c)]. Two of the sequences found at equilibrium are 
lethal mutants, but the quasispecies for VSV is mainly dominated by 
the neutral mutants carrying a beneficial mutation and by sequence 
*mo m a wide range of mutation rates. The case with s B = 0.25 and 
So = 0.9 actually makes the sequence x noo much more dominant in a 
large range of mutational values [Figure2(c)]. For more details on the 
dynamics using VSV DMFE data see SI Section S4. 

A qualitatively different dynamics is found for the DMFE data for 
TEV without phenotypic reversions. Here A B = 0 gives rise to a 
different fixed point given by Qi (see SI Section S4). For this case, 
the equilibrium also consists of four sequences: Xq 100 , ^o 110 > an d the 
lethals and xjj ui . Similarly to VSV, the dominant sequences at 
increasing mutation are the sequences with only neutral and with 
both neutral and deleterious mutations. Lethal sequences become 
abundant only when mutation rate is extremely high [Figure 2(b)]. 
Some examples of the time dynamics and phase portraits for TEV 
DMFE data can be found in Figs. S3 and S4. 

Interestingly, our results reveal that the fittest phenotype does not 
go extinct as we might expect. The usual error threshold sets in when 
H = 1 — [w un fi t lwf it ] , where w un f it and Wju are, respectively, the lowest 
and the highest fitnesses of the sequences 3 '. For example, according 
to this prediction, we would expect the extinction of the sequence 
1100 for VSV when \i > fj. c = 1 - [(1 - 0.25)/l] = 0.25. However, 
the panels (a) and (c) of Figure 2 show persistence of 1100 at this 
mutation rate and higher. Hence, the simple expectation previously 
defined fails. The reason for this phenomenon may be due to the fact 
that the unfit subpopulation ( 1 1 1 0) is constantly loosing members to 




mutation rate, ]X 



Figure 2 | Impact of increasing mutation rate in the population equilibria 
without phenotypic reversion. Here we use 6 = 0 (i.e., no phenotypic 
reversion) and DMFE data for Vesicular stomatitis virus (VSV) 14 and 
Tobacco etch virus (TEV) 15 . In (a) and (c), we display the equilibrium 
populations for VSV: x noo (black); x nl0 (red); x nol (green); and x nll 
(blue). In (b) we show equilibrium populations for TEV: Xq WQ (violet); 
Xbno (cyan); ^) 101 (maroon); and Xq IU (dark green). For both VSV and 
TEV, all other variables have zero population numbers at equilibrium. We 
used: (a) s B = 1 > s D = 0.25; (b) s B = s D = 0.5; and (c) s B = 0.25 <s D = 0.9. 



the lethal class (1111), and then w un r lt should be weighted in the 
previous calculation to include these lethal members, which might 
prevent the error catastrophe. 

When phenotypic reversions are allowed, i.e., <5 0, the non- 
mutated strings (i.e., wild-type string x 0 ooo) also decrease their popu- 
lation density dramatically as mutation is slightly increased. This 
finding is common for both VSV and TEV DMFE data. However, 
such a population is maintained by keeping its equilibrium in small 
populations for VSV DMFE data, i.e., ;Cq 000 < 10~ 7 , for the whole 
range of mutation rates (see Figure 3 and panels (d-f) in Figure 
SI). This effect also takes place for different values of the selection 
coefficients but is less dramatic for TEV data, where Xq 000 <10" j 
(Figure 3 and panels (d-f) in Figure S3). A plausible explanation 
for this effect is that as TEV does not produce beneficial mutants, 
competition is weaker and thus the wild-type phenotype can persist 
even for high mutation rates. For VSV, however, the equilibrium 
populations of the strings with the beneficial phenotype (i.e., 
sequences Xiooo.noo) undergo a dramatic increase for very low muta- 
tion rates in all the scenarios analyzed in Figure 3. Typically, such 
sequences decrease their population numbers at increasing mutation 
rates (see also Figure S5). Hence, for both VSV and TEV examples, 
when phenotypic reversions are allowed, no error thresholds exist at 
which the wild-type sequence (and its neutral mutants) disappear 
from the population. For the particular case considering phenotypic 
reversion, explicit analytical approximations can be derived assum- 
ing that the probability of phenotypic reversion is small (i.e., <5 2 0). 
We refer the reader to Supplementary Information Sections S5 and 
S6 for detailed calculations. 

We notice that for VSV there is an available beneficial mutation, 
and since reversions are rare, both mutational and selective pressures 
favor it, and all sequences at equilibrium have 1 in the first position. 
Similarly, the neutral bit is favored by mutation and neutral selection; 
thus every sequence has it as well. For TEV the situation is similar, 
but since there are no beneficial mutations, the first bit is always 0. In 
either case, this is essentially a 2-locus problem. 

In the previous analyses we have considered experimental DMFE 
to characterize the dynamics of Eq. (1). Since the proportion of 
mutations that are effectively neutral, beneficial, and deleterious 
may vary for other species, we provide analytical and numerical 
results of the model as a function of all parameters, including arbit- 
rary values of the DMFE, given by k^. These analyses can be found in 
Supplementary Information Sections S4, S5, S6, and S7 (see also next 
sections). 

Exploration of the parameter space. The previous results without 
phenotypic reversion reveal the existence of error thresholds shifting 
the quasispecies towards beneficial, neutral, deleterious, and lethal 
regions of the phenotypic sequence space in an asymmetric manner. 
The error threshold has been discussed as a phenomenon causing the 
loss of viral genetic information due to increased mutation 19 . 
Actually, the error threshold is an evolutionary transition in 
sequence space that can delay or even prevent extinction by 
moving the population towards genotypes that are robust to 
extinction 40 , as our results indicate, i.e., we characterized 
persistence of beneficial and neutral phenotypes at equilibrium for 
a wide range of mutation rates (see Figures 2 and 3). 

In this context, our theoretical approach allows us to address the 
following question: what is the likelihood of finding scenarios in 
which genomes fail to replicate and thus the viral population is only 
composed of deleterious and/or lethal phenotypes? To explore this 
question, let us define three different scenarios given by a quasispe- 
cies with different equilibrium population values, N*, as follows: 

• Scenario (A): N* lethal mutants, Xj=N*, i.e., sum of the 
populations of lethal sequences. 

• Scenario (B): N* lethal plus deleterious mutants, ^J. L[ j D x i = 
N*, i.e., sum of scenarios (A) and (C). 
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mutation rate, u mutation rate, u mutation rate, u mutation rate, u 

Figure 3 | Population equilibria for sequences and phenotypes at increasing mutation considering phenotypic reversion. We here consider phenotypic 
reversion (S = 10~ 3 ) and DMFE data for VSV and TEV (in linear-log scale) with: (a) s B = s D = 0.5 and (b) s B = 0.25 < s D = 0.9. In the first row we display 
sequences: Xq Q00 (black line), and the pool of mutants in red lines (except for strings Xoioo (dotted red line), x 1000 (dashed red line); and x noo (blue line)). 
For clarity, phenotypes in the second row are grouped as follows: sequences ^) 0 oo pl us ^aioo (wild-type, W, in black); x 1000 plus x 1100 (beneficial, B, in 
green); xboio P ms -"bno (deleterious, D, in blue); and lethal (I, in red) with the sum of all sequences with bit 1 in the last position; and x ww plus x luo 
(sequences with beneficial and deleterious mutations, in violet) . Sequences Xioio and Xi 1 1 0 can be included in phenotypic classes W, B, or D, depending on 
the values of the selection coefficients [i.e., s D = s B ( W), s D < s B (B), and s D > s B (D)]. 



• Scenario (C): N* deleterious mutants, ^ ^Xj =N", i.e., sum of 
the sequences with the deleterious phenotypes, and, if s D > s B , 
adding also the sequences with both deleterious and beneficial 
mutations with fitness A = 1 + s B — s D ). 

All our previous analyses using VSV and TEV data suggest that the 
possibility to push the entire quasispecies towards lethality by 
increasing mutation (scenario (A) with N* = 1 ) may be very unlikely 
due to variable DMFE. To test the generality of this result, we per- 
form an extensive search in the parameter space of Eqs. (1), which 
allows us to identify those parameter combinations fulfilling scen- 
arios (A), (B), and (C) for different values of N*. To do so, we use a 
MonteCarlo (MC) algorithm to randomly sample the parameter 
space of Eqs. (1) (see Model and Methods section). The results 
obtained without phenotypic reversion are shown in Table I and in 
Figure 4. We found that the mean percentage of parameter combina- 
tions, (p*) • 100, that push the quasispecies towards lethal vertices of 
the phenotypic space is extremely small (see parameter spaces (a l , fi) 
and (n, X B ) for scenario (A) in Figure 4). For instance, such a per- 
centage is about 1.88% using N* s 0.9. As expected, the parameter 
values responsible of increasing populations of lethal sequences are 
mainly those combinations involving extremely high mutation rates, 
although such mutation can diminish at increasing values of At. 
However, as we display in the histogram of Figure 4, the likelihood 
to push the whole quasispecies towards lethality is very small. 
Specifically, the histogram displays the mean percentage of para- 
meter combinations for each value of N* and scenario, averaged over 
100 independent replicas, where each replica corresponds to M = 10 6 
iterations of the MC algorithm. 

Our results also reveal that when beneficial mutations are less 
common, mutation rate does not need to be so high to produce some 
fraction of the population composed by lethal sequences, although 
it is also very unlikely to have a quasispecies fully composed by 
lethal sequences (it simply will not replicate). We also found that 
the selection coefficients have a little effect in scenario (A), since no 
clear pattern is shown for the different values of N* in the parameter 
space projection given by (s D , s B ) that is displayed in Figure 4. 



Concerning to scenario (B), where we put together lethal and 
deleterious genomes, the percentages of parameter combinations 
pushing the whole quasispecies to this scenario is larger than for 
scenario (A). For instance, the probability of finding parameter com- 
binations with a quasispecies composed by N* s 0.9 lethal plus 
deleterious sequences is about 6.8%, which is still a low value. 
Here, as we identified in the previous analyses, selection coefficients 
do not play an important role in this scenario. Finally, the percentage 
of parameter combinations pushing the quasispecies to deleterious 
nodes in the phenotypic sequence space is also extremely low. We 
may notice also that having a population dominated by deleterious 
mutants does not imply that viral sequences will not be able to 
replicate, since some of the deleterious sequences will be able to 
replicate whenever s D is small. For instance, a sequence with the 
deleterious phenotype will be able to replicate at a rate A = 0.8 when 
s D = 0.2. 

The exploration of the parameter space considering phenotypic 
reversion reveals that the percentage of parameter combinations 
driving to lethality decreases even more (see Supplementary 
Information Section S7 for further details). All together, the previous 
results suggest that the quasispecies is very unlikely to be driven 
toward full lethality by manipulating the parameters of the model, 
including mutation rates (see Figures 2-4, Supplementary 
Information Section S7, and Figures S5-S7). We performed an 
ANOVA test with the data displayed in Table I, taking "scenario" 
and "phenotypic reversion" as fixed factors, and "N*" as a variable 
and covariable. Factor "scenario" had a significant effect (P < 
0.0001) with the following rank: (B) > (A) > (C). In other words, 
(p*) was larger in scenario (B) than in (C), with (A) occupying an 
intermediate position. "Phenotypic reversion" had a significant 
effect (P = 0.0165), with lower values of (p*) when phenotypic 
reversion was allowed and higher when it was not allowed, meaning 
that phenotypic reversion actually makes lethality much more dif- 
ficult. The covariable "N*" also had a significant effect (P < 0.0001), 
with the largest values of (p*) when N* is lower, indicating that the 
probability of having the majority of sequences in lethal vertices was 
small in this case. Of all interactions between the factors and the 
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Table I | Statistics obtained from the MonteCarlo (MC) exploration of the parameter space of Eqs. (1 ). For S = 0 (no phenotypic reversion) 
and for 8 > 0 (phenotypic reversion), we computed the mean percentage (±SD) of parameter combinations, (p*) • 100, pushing the 
quasispecies towards scenarios (A), (B), and (C) for different population equilibria, N*. Each data value was averaged over 100 
independent replicas, and, for each replica, we ran M = 1 0 6 MC iterations. The same data represented in histograms for S = 0 and S > 
0 are displayed, respectively, in Figure 4 and Figure S6(b) 



No phenotypic reversion (<5 = 0) Phenotypic reversion (S > 0) 



Scenario 


>N* 


<p*> ■ 1 00 


±SD 


<p*> • 1 00 


±SD 


(A) 


2:0.6 


10.7471 


0.0031 


7.3394 


0.0272 


2:0.7 


7.1483 


0.0026 


4.0553 


0.0212 




2:0.8 


4.2387 


0.0019 


1 .7490 


0.0124 




2:0.9 


1.8815 


0.0012 


0.3808 


0.0064 


(B) 


2:0.6 


23.7100 


0.0039 


20.27 


0.0422 


2:0.7 


17.5503 


0.0036 


14.161 


0.0352 




2:0.8 


12.1097 


0.0031 


8.544 


0.0264 




2:0.9 


6.8064 


0.0024 


3.478 


0.0183 


(C) 


2:0.6 


3.3089 


0.0018 


3.0518 


0.0154 


2:0.7 


1.4216 


0.0012 


1.148 


0.0103 




2:0.8 


0.4308 


0.0006 


0.2547 


0.0048 




2:0.9 


0.0572 


0.0002 


0.0145 


0.001 1 



covariable, the only significant one was "scenario" X "N*" (P < 
0.0001), meaning that differences between scenarios are bigger for 
small N* than for large values of N*. 

The previous estimations of the percentages of parameter combi- 
nations pushing the quasispecies towards population values N* for 
scenarios (A), (B), and (C) were computed at equilibrium. But, how 
do these probabilities behave in transient time? To answer this ques- 
tion, we performed similar analyses forN(f) instead ofJV*. That is, we 
used the MC algorithm to quantify the fraction of parameter combi- 
nations involving different population values for scenarios (A), (B), 
and (C) at a given time, t (we used t = 10 2 , t = 10 3 , and t = 10 4 ). This 
strategy allows us to explore the likelihood of these scenarios taking 
place also during transients. The results, displayed in Supplementary 
Information Table SI, revealed that such percentages are indeed 
smaller during transients, especially when 8 = 0. For instance, the 
probability of finding parameter combinations that push the quasis- 
pecies towards scenario (A) considering N(t = 100) s 0.9, is (p(t)) ~ 
0.0066, which is approximately 3-fold lower than the same probabil- 
ity computed at equilibrium. As expected, such probabilities get close 
to the probabilities computed at equilibrium for larger values of time. 
Similar results were found for scenarios (B) and (C) (see 
Supplementary Information Table SI). 

The ANOVA of data from Table SI, found a net effect of time (P < 
0.0001): time has a different behavior for each scenario (P < 0.0001). 
For scenario (A), (p(t)) for the first evaluated time (i.e., t = 10 2 ) is 
significantly lower than for the other two scenarios, which remain the 
same. In scenario (B), this difference was even smaller, and in (C) this 
difference was not found. These data indicate that for t = 10 2 the 
convergence to equilibrium in scenario (A) has not been reached and 
thus the system is in the transient state, but for scenarios (B) and (C) 
the system is almost at equilibrium. We notice that the values of (p(t)) 
displayed in Table SI, are typically lower in transient time, meaning 
that the probability of pushing the quasispecies toward lethal vertices 
of the phenotypic sequence space is smaller during the transient time, 
although such values closer to equilibrium also remain very small, 
specially for large values of N(t). 

Transient regimes as a function of S and other parameters. We are 

particularly interested in the effect of the probability of phenotypic 
reversion, <5, on transient times (see Model and Methods section). 
Our results indicate that the smaller 8 the longer is the time needed to 
approach the fixed point at the given distance r\. This time depends 
on two things: the component of the initial data in the eigenvector of 
the matrix A in the direction of x» and on the eigenvalue of DP(x») 



closest to zero. As discussed in Supplementary Information Section 
S5, the smaller is 8 the closest is the eigenvalue to zero and, hence, the 
required time is larger. 

It is worth to remark that the fraction of M* for which the approxi- 
mation to distance rj to x* is not produced at the final integration time 
fmax is extremely small: around 0.00002 for i5 = 10~ 3 and increases to 
0.00022 for 8 = 0. 

Figure S8(left) shows the results using M* = 10 8 and r\ = 10~ 3 for 
the values 8 = 10~ 3 , 10~ 4 , 10~ 5 , 10~ 6 , 10~ 7 , and 8 = 0, from left to 
right. On the horizontal axis x is represented in log 10 scale. The value 
of t max is set equal to 10 s . 

What happens if r\ is reduced? Of course, the transient time, until 
the distance to x* is less than will increase. To illustrate this fact we 
reproduce in Figure S8 (right) the same curves in the left plot for 8 = 
10~ 3 , 10~ 5 and 0 for \\ = 10~ 3 (in red) and we show the corresponding 
curves for 17 = 10~ 6 in blue. We see that in the first two cases the 
curves shift a little to the right, while in the 8 = 0 case they shift by a 
large amount. The computations for if = 10~ 6 use again f max = 10 s 
but M* is reduced to 10 s . 

Now, while for <5 = 10~ 3 still only a small fraction, around 0.00005, 
of cases do not approach x* at the distance f; at the maximum time, 
this fraction increases dramatically to 0.076 for 8 = 0. 

According to the theory developed in Supplementary Information 
Sections S4 and S5, if 8 > 0 is small the final approach to x» is of the 
form exp( — y8 112 ) where y is the coefficient of 8 1 ' 2 in Eq. (33) of the 
Supplementary Information. On the other hand, for <5 = 0 the 
approach to x* is like a constant divided by t. 

From the data which allow to produce Figure S8 (right), one can 
compute, by Lagrange interpolation, the values of x for different 
values of (p(x). We have selected values of <p(x) of the form 0.1 k, k 
= 1, . . ., 9. Let x 1 be the value of x when f\ = r\i = 10~ 3 and x 2 the one 
for if = r) 2 = 10~ 6 . For 8 > 0 the decrease of should satisfy, 
approximately, 

n 2 = i1 l exp(-y8 l/2 {x 2 -x l )y 

The value of y depends on the average value of the random para- 
meters and also on the value of <p(x). 

Hence, the decrease of 8 112 by a factor of 10 should be compensated 
by an increase of x 2 — x l also by a factor of 10. This is shown in Figure 
S9.The red and blue curves showlog 10 (T 2 — Ti) for different values of 
(p{x). The differences are very close to 1, as expected. On the opposite 
case, 8 = 0, the green curve shows log^fe/fi)- It is almost exactly 
equal to 3, in perfect agreement with the fact that log 10 ()/ 2 /'7i) = — 3. 



SCIENTIFIC REPORTS | 4:4625 | DOI: 1 0. 1 038/srep04625 



5 



Scenario (A): lethal mutants 



Scenario (B): lethal + deleterious mutants 




Scenario (C): deleterious mutants 



ii i l 




Figure 4 | Exploration of the parameter space of Eqs. (1) without phenotypic reversion (i.e., S = 0). We plot two-dimensional parameter space 
projections using M= 10 6 iterations of the MonteCarlo (MC) algorithm displaying parameter values pushing the quasispecies towards scenarios (A), (B), 
and (C) for different population equilibria, N*: N* > 0.6 (cyan), N* > 0.7 (violet), N* > 0.8 (yellow), and N* > 0.9 (maroon). Note that N* a x* 
includes all values of y* > x* (e.g., N* > 0.6 includes all other N* values analyzed). The histogram displays the mean percentage of parameter 
combinations, (p*) 1 100, with (p*) = (c(N*)/M), fulfilling scenarios (A), (B), and (C) for the investigated values of N* (using the same colors for each 
value of N* previously described). Each bar was computed over 100 independent replicas of the MC algorithm (each replica run for M = 10" iterations), 
standard deviation (SD) not shown (see Table I below). 
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Summarizing, this analysis and numerical checks allow obtaining 
good indications on the rate of approach to the fixed point. 

Discussion 

In this article we have extended Eigen's quasispecies model using a 
phenotypic description of the quasispecies that considers variable 
distributions of mutational fitness effects (DMFE), which are known 
to be crucial to RNA virus evolution 25,26 . Although the DMFE has 
been experimentally quantified for several viruses 14,153233 , previous 
theoretical models 3435 have not deeply analyzed the effect of the 
DMFE in persistently infected hosts. Moreover, none of them 
included these experimental data in the standard quasispecies model. 
Here we intend to cover this gap, paying special attention to the 
interplay between the DMFE and mutation in possible deleterious 
or lethal scenarios in large viral quasispecies populations. The inter- 
est in characterizing the effects of artificially- increased mutation 
(mutagenesis) in viral replicators comes from the fundamental pre- 
diction of the classic quasispecies theory 1,2 : error-prone replicators 
have a limited tolerance to mutations, beyond which an error cata- 
strophes takes place and the genetic information is lost. Several mod- 
els have shown that when mutation rate is increased, the population 
moves towards lower fitness classes, and the average fitness of the 
quasispecies or the average growth rate per generation 
diminishes 39,40,42,43 . In such a case, for discrete systems, when the 
average growth rate becomes smaller than one, the population enters 
into an extinction regime that causes the disappearance of the 
quasispecies. 

Previous works on the induction of mutagenesis in poliovirus and 
VSV revealed a 100-fold decrease in viral titers 44,45 . Further experi- 
mental evidence confirming the effect of mutagens in decreasing 
infectivity and replication was later described for HIV-1 in tissue 
cultures 46 . Similar results were also found for Lymphocytic choriome- 
ningitis virus, hantaviruses, and Hepatitis C virus, among others (see 
review 47 ). Despite these previous investigations, the feasibility of 
entirely pushing a quasispecies towards lethality by manipulating 
external parameters such as mutation rates still remains a matter 
of debate, especially in persistently infected patients, i.e., in quasis- 
pecies with large population numbers. Our model, aimed to answer 
this question, predicted that an external forcing towards lethality 
(e.g., increased mutagenesis) would be unlikely to be found in large 
quasispecies populations. The generation of different mutational 
types seems enough to ensure the persistence of viable viral genomes, 
even if highly deleterious or lethal mutations are constantly 
produced. 

Our theoretical approach does not include other possible sources 
that could enhance viral extinction or involve significant viral fitness 
decrease. For instance, we consider soft selection, i.e., no degradation 
of sequences, because we are mainly interested in the net effects of 
mutation rate, DMFE, and phenotypic reversions in the viability 
and lethality of the viral phenotypes. Also, we are not considering 
discreteness (individual based model) or finite populations with sto- 
chasticity 41 (e.g., lethal defection 48 ), or bottlenecks during transmis- 
sion events that turn on Muller's ratchet (shown to operate in phages 
06 49 and MS2 50 VSV 51 , TEV 52 , Foot-and-mouth disease virus 53 , and 
human immunodeficiency virus type i 54 ). As mentioned, the standard 
quasispecies model (and thus our model as well) does not consider 
sources of stochasticity, suggested to be important in realistic experi- 
mental systems as well as during initial infections, when the number 
of viral particles can be extremely low. However, as we previously 
discussed, we considered a deterministic quasispecies model as a 
proxy to characterize the effect of the DMFE during a persistent 
infection, for which intrinsic or demographic noise of viral popula- 
tions might be negligible. Further research should include these fac- 
tors in quasispecies dynamics together with DMFE. 

Our model predicts that increasing mutation rates may not be 
enough to drive viral quasispecies towards highly deleterious or 



lethal phenotypes in large populations, as may be those typical of 
persistent infections. It would be interesting to perform experiments 
like those described in Refs. 14 and 15, in which the properties of the 
DMFE were evaluated, but at artificially increased mutation rate. 
Such experiments would facilitate the comprehension of the role 
played by the DMFE at increasing mutation in possible lethal scen- 
arios for viral quasispecies in vivo, also allowing to test our predic- 
tion. Moreover, these experiments would also give clues about the 
interplay between mutation and the DMFE, an important issue, that, 
to the extent of our knowledge, remains unknown. 

We notice that our results may also be relevant in the context of 
lethal mutagenesis. Lethal mutagenesis refers to a viral population's 
extinction through an excess of mutations, often promoted by muta- 
genic nucleotide analogs administered during viral replication 8,36,41 . 
Lethal mutagenesis is a demographic phenomenon that operates in 
finite populations, not considered in our model. Nonetheless, our 
model indicates that the chances to enter into a regime of declining 
viral populations may be very unlikely, since the variability in the 
DMFE will constantly keep available phenotypes able to successfully 
replicate in the population. 

As a final point, the theoretical framework and the results pre- 
sented here, developed for viral quasispecies, may also have implica- 
tions in carcinogenesis. In cancer, the maximum amount of genetic 
instability that can be tolerated by tumor cells, suggested to also 
behave as quasispecies, has been proposed 55,56 . Genomic instab- 
ility 57,58 is a hallmark of most malignant tumors, which could be 
ablated by increased mutagenesis 59 61 . Our model suggests that the 
success of such strategies should be weighted by the corresponding 
distribution of mutational fitness effects, which can be defined by 
taking into account the distinct roles played by genes affecting pro- 
liferation, DNA repair or apoptosis, along with house-keeping 62 . The 
potential success of suppression of tumor growth through mutation- 
induced lethality as well as the resistance of some clones to drugs 
should be considered as two faces of the unstable tumor dynamics, 
which may require considering the repertoire of mutational fitness 
effects. 

Methods 

Model. To analyze the impact of the DMFE on viral quasispecies we build a 
phenotypic mathematical model using Eigen's formulation 1 . The model describes the 
time dynamics of i classes of binary sequences with population numbers x { able to 
carry different mutational types that can replicate and mutate according to the 
differential equations: 

^ = (1 - fi)A iXi +fiJ2 aJT 1 J? M] M - 0>* ; , (1) 

(ih 

with i,j£H y , being TC V the configuration or sequence space (i.e., boolean v- 
dimensionalhypercube), with'H 1 ' — {1, . . . ,2 V }, and v = 4, resulting in a quasispecies 
composed by n — 16 different sequences. Each sequence has 4 bits, each of them 
corresponding to each particular type of mutation: beneficial (B, 1 st bit), neutral (JV, 
2 nd bit), deleterious (D, 3 rd bit), and lethal (I, 4 th bit), as we display in Figure 1. Our 
model becomes a generalization of the classic Eigen's quasispecies model 2 , which 
incorporates new parameters to introduce variance in fitness effects due to mutation 
and, eventually, phenotypic reversion. We refer the reader to Ref. 38 for a stochastic 
model similar to Eqs. (1). 

The first term of Eqs. (1) is the error-free replication of string i with replication rate 
Aj and mutation rate fx. The second term corresponds to the influx of mutant strings) 
from orthant neighbors {denoted as <j>;) of 7i v producing string i by mutation. 
Parameters > 0, with — 1- denote the fraction of mutations with a 

fitness effect k e [B, N, D, L} from sequence j to sequence i during error-prone 
replication. These parameters are used to incorporate the DMFE in our model {see 
Section Experimental data below). Parameter R^ i] is the probability of occurrence of 
transitions 0— » 1 and 1 — > 0 (i.e., phenotypic reversion) from sequence; to sequence i 
during replication, with: 

r . i f 1 if transition is 0—>l, 

I 0 < <5< 1 if transition is 1 -> 0. 

We notice that our approach allows us to model the process of mutation considering 
no backward mutations (i.e., no phenotypic reversion), setting S = 0; as well as to 
model phenotypic reversions due to backward or compensatory mutations (with 5 > 
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0). The total population is held constant (i.e., X% — 1) by a compensating 

dilution flux with rate coefficient O. The outflow term, <S>, is obtained under the 
constant population condition, i. e., x i — 1> giving: 



(fit 



(2) 



The state space of system (1) is a high -dimensional simplex (i.e., 2 V — 1 simplex) 
represented by the nonnegative vectors 

Ri+ — j(*oooo, ■ ■ • ,xnn)EM n \xi>Q,iETt v , ^jT^ Xj — l,i — 1, . . . ,n\. As initial condi- 
tions we will use (if not otherwise specified) x OOQO (0) — 1, setting all other variables to 
zero. 

All 2 4 different strings of the quasispecies can be mapped into a phenotypic space 
(fitness landscape) with four different fitness classes and three different peaks 
[Figure 1(a)]: 

• Peak (W) populated by sequences Xoooo.oioo (wild-type phenotype and the neutral 
mutants), with replicative fitness A 0 ooo,oioo = 1. 

• Peak (B) populated by sequences Xnoo.iooo (beneficial mutants plus the neutral 
mutants), with replication rate A 1000 j 100 — 1 + s B . 

• Peak (D) populated by sequences x 0 oio,oiio (deleterious mutant and the neutral 
mutants), with replicative fitness A 0 oio,oiio — 1 — s d- 

• The lowest replication value of the fitness landscape is given by the lethal phe- 
notypes, (L), given by sequences x a t, c i — 0, a, b, c e {0, 1}, with replication rate 
equal to zero. 

We refer the reader to Supplementary Information Section SI which contains all i 
equations obtained from (1). Sequences with both beneficial and deleterious muta- 
tions (given by Xooio.ono and with replication rates A 1010 ,ino = 1 + $b ~ s d) will 
populate fitness peaks (W), (B), or (D) depending on the values of the selection 
coefficients, s B and s D . Note all fitness classes include the neutral mutants. Equations 
(1) depend on the parameters p, s B , Sd> X b , X d , X l , X n , ana " <5- We set 0 < s B ^ 1, to 
ensure that beneficial mutations always increase replication rates. We also set 
0 < 5d < 1 (if not otherwise specified) to ensure that mutations are always deleterious, 
also avoiding a 100% decrease in fitness that would result in a lethal mutant (lethal 
mutants are explicitly introduced as variables in our dynamical system). Note that for 
s B > Sd, mutants x ltn0tl 1 10 are fitter than the wild-type string and will be considered as 
beneficial, while for s B < s D their fitnesses are lower than the wild-type fitness and 
thus will be considered as deleterious. For combined mutations (i.e., strings carrying 
two or more types of mutations) we assume additive fitness effects. Although epistatic 
interactions have been described for RNA viral genomes 63 , as a first approach we will 
here use additive fitness effects. Hence, the ranges for these parameters are: fi e (0, 1), 
s B G (0, 1], s D e [0, 1), a b > 0, X D > 0, A L > 0, X N > 0, l B + a d + X L + X N = 1, 5 > 0. 

It is important to notice that Eigen's model describes the dynamics of explicit genome 
sequences 2,3 ' 6 . We here simplify the system to sequences where mutations are not 
nucleotide substitutions but changes involving possible phenotypic transitions. Hence, 
our sequences are not viral genomes but an abstract representation of the genomes in 
terms of phenotypic traits (see Ref. 64 for other phenotypic approaches to viral qua- 
sispecies). In this sense, the relationship between our modeling approach and true viral 
genomes becomes obvious from the identification of groups of viral genomes that 
belong to the same fitness phenotypic classes (i.e, all have the same fitness regardless the 
specific mutations they carry in their genomes). These phenotypic classes are the entities 
represented in our model. Furthermore, keeping in mind our goal of exploring the role 
of the DMFE in quasispecies dynamics and transitions, we necessarily made several 
simplifying assumptions that allow for an analytical treatment. These assumptions and 
their limitations are discussed in Supplementary Information Section Si. 

Experimental data. In this article we use experimental DMFE data for VSV and TEV 
as a case study for viral quasispecies. Sanjuan et at 14 generated a collection of single- 
nucleotide substitution mutants of VSV and evaluated their fitnesses in vitro. They 
found that the frequencies of the four different fitness classes were (using our 



notation) X 



0.042; a 



0.2707; X\ 



0.2917; and X 



0.3956. Using a 



similar experimental approach but in this case measuring fitness effects in vivo, 
Carrasco et al. 15 found that such frequencies for TEV were X 



-0.227; 



0.364; and X% — 0.409. To simplify notation, in our model we will use X k 



D 

instead of ?J . We refer the reader to references 14, 15 for further details. 

Numerical algorithms. We are especially interested in performing an extensive 
exploration of the model parameters to address several issues. Our interests here are 
mainly two: 

a) Given arbitrary parameters of the model, we desire to compute the population 
values of beneficial, deleterious, and lethal sequences at the equilibrium. 
Furthermore, as typically the parameters are not well known or, simply, non 
available, we want to establish the probability that the fraction of sequences of 
each one of the previous types exceeds some fraction N* (being N* a given fixed 
population equilibrium value). For this later case, we are mainly interested in 
quantifying the likelihood of finding parametric scenarios impairing viral 
sequences success, i.e., parameter combinations generating deleterious and/ 



or lethal phenotypes. To do so, we will use the population scenarios (A), (B), 
and (C) (defined in the Results Section: Exploration of the parameter space). As 
mentioned, in all these scenarios we want to have a measure of the fraction of 
parameters for which these scenarios reach values of population size exceeding 
a given threshold, N*. The domain V where we locate the parameters is a cube 
[0, l] 4 corresponding to the values of \i, s B , s D , 3 times a three-dimensional 
simplex S3 in the parameters a b , X d , X l , X n with the constrain X B + a d + a l + 
X N = 1. Different assumptions can be made on the probability density in 
V = [0,l] 4 xX 3 . 

b) How fast go the initial conditions (typically the vector x(0) — (1, 0, . . ., 0) J ) to 
the attractor to which they tend? In other words, what can be said about 
transients? This is irrelevant if we assume that the evolution time is as large 
as desired. However, for finite time evolution, T, the distance from x(T) to the 
attractor can be large. 

Let us describe the algorithms to carry out the computations in both cases. 

In principle, if we consider a uniform probability density in V to answer question a) 
amounts to compute the volume of a semi algebraic set in the parameter space. 
Indeed, let us denote a value of the parameters simply as YleV. The amount of viral 
sequences of a given class corresponding to the equilibrium associated with FL can be 
denoted asg(Il). It is clear, due to the dependence of the matrix A (see Supplementary 
Inforamtion Section S4) and, hence, of the characteristic polynomial and the eigen- 
vectors, that g is an algebraic function. Therefore the volume of the region of interest 
in 2 is bounded by the hypersurface g(n) = N* and boundaries of S, 

However, even in the case that g is explicitly available, this is of little use because of 
the dimensionality. Moreover in the generic case g is not explicitly known. This 
suggests to use a MonteCarlo (MC) method to compute the volume of interest. The 
steps are as follows: 

• Using a (pseudo-) random number generator with uniform distribution in [0, 1], 
we generate four values to obtain a point in [0, l] 4 . Then, we generate points with 
uniform distribution in E 3 using a standard method. 

• When all the parameters are available we look for the dominant zero of the 
polynomial p B in Eq. (24) of the Supplementary Information. Let us denote it 
as (T 3 as in Supplementary Information Section S3. Then we proceed to the 
computation of the components of the fixed point using the formulas (25) and 
(26) of the Supplementary Information. 

• The process is repeated M times (M large, a typical value being M — 10 8 ). Then, 
for the different values of N* used in the study we count the number of cases ) 
in which g(Tt) ^ N*. The desired probability is p* = c(N*)/M. In many of our 
analyses we will represent such a probability as a percentage (i.e., meaning the 
percentage of parameter combinations driving the quasispecies towards a given 
scenario for a given value ofN*, if not otherwise specified). Alternatively, we can 
also be interested in transient times (see Results section), and perform a similar 
analysis considering a given population value for scenarios (A), (B), and (C) at a 
given time, N(t). Then, the previous probability will be represented as p(t) = 
c{N{t))/M. 

In a large part of the computations the parameter 3 is not taken with uniform 
probability but uniform in log scale, say in the range [di, 3 2 ] . Then the value of log(<5) 
is generated with a uniform distribution in [log^), log(<5 2 )] and proceed as before. 

It can be also interesting to fix 3 and proceed in a similar way for a set of values of 3. 
Then we can recover a probability function p(3) by interpolation. The desired global 
probability with respect to a density f(3) is obtained by integration and normalization 



L md6 ) L 



p(5)f(S)dS. 



(3) 



We pass now to the point b). Given parameters II we can compute the equilibrium x*. 
On the other hand, we can start the integration of system (3) of the SI from the initial 
point x(0) — (1,0, 0) r until the current point x(f) reaches a given distance n from 
x*. Typically we have used the value t\ — 10~ 3 and the distance has been computed 
using the | \^ norm. The integrations are carried out using the Runge-Kutta-Fehlberg 
RKF78 method, with automatic step size control and local relative tolerance 10~ 15 . 
After every step the distance to x* is checked. This allows obtaining the distribution of 
transient times and the effect of the different parameters on it, with special emphasis 
on the role of <5 as is illustrated below and in the Results section. 

Computation of transient times. In order to see how parameter 3, i.e., phenotypic 
reversion, affects the transients, we carry out the following experiment. For a fixed 
value of 3 we generate a random sample of M* sets of the remaining parameters, as 
described in Numerical algorithms subsection above. Then, for each set of 
parameters, the integration of Eq. (2) in the Supplementary Information is started 
with x(0) — (1, 0, 0) r . The integration is stopped when the distance to the fixed 
point is a small fixed value y\. The distance is measured as r\ = max, =1) _ _ il6 |x,(0 — 
(x*),-|- For a given value z of the time we record the fraction <p{z) of M* such that the 
distance n is reached for a value of t < %. Then we display <p(z) as a function of r (see 
Figure S8). Of course, for practical reasons, the integration time is limited to a fixed 
large f max . 
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